Binary document intelligence. 32× smaller than float32 RAG. 48× smaller than HNSW. 96× with BGE-base PCA.
Real numbers, real data, reproducible. See benchmark below.
NodeMind replaces float32 vector indexes with compact binary fingerprints. Instead of storing thousands of bytes per chunk (BGE-M3 float32 = 4,096 bytes), it stores 128 bytes. Retrieval uses Multi-Index Hashing (MIH) — pure integer arithmetic, no GPU, no external vector database.
Upload a PDF → get a 64 MB index instead of a 2 GB one. Query it on any CPU.
Live demo: nodemind.space
Tested on 500,000 chunks from a mixed real-world corpus: Wikipedia, arXiv papers, and Project Gutenberg books.
Embedded with BGE-M3 (1024-dim) on an NVIDIA A40 GPU.
Recall measured against exact cosine top-k ground truth on float32 embeddings.
| Metric | NodeMind MIH |
|---|---|
| Recall@1 | 0.999 |
| Recall@3 | 0.999 |
| Recall@5 | 1.000 |
| Recall@10 | 1.000 |
| Recall@20 | 1.000 |
| MRR@10 | 0.9992 |
1,000 queries sampled from the same corpus. Ground truth = exact cosine top-20 on float32.
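The recall and MRR figures above can be computed with a few lines. A minimal sketch, assuming the convention that Recall@k counts the exact-cosine top-1 neighbour appearing in the method's top-k (the benchmark PDF defines the exact protocol; this is an illustrative assumption, not the official evaluation script):

```python
import numpy as np

def recall_at_k(retrieved, truth, k):
    """retrieved, truth: (n_queries, depth) arrays of ranked chunk ids.
    Fraction of queries whose true top-1 appears in the method's top-k."""
    hits = [truth[i, 0] in retrieved[i, :k] for i in range(len(truth))]
    return float(np.mean(hits))

def mrr_at_k(retrieved, truth, k=10):
    """Mean reciprocal rank of the true top-1 within the method's top-k;
    queries where it is missing contribute 0."""
    rr = []
    for i in range(len(truth)):
        pos = np.where(retrieved[i, :k] == truth[i, 0])[0]
        rr.append(1.0 / (pos[0] + 1) if len(pos) else 0.0)
    return float(np.mean(rr))
```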
| Metric | BGE-base 768-bit | BGE-base 256-bit (PCA) |
|---|---|---|
| Recall@1 | 0.999 | 1.000 |
| Recall@5 | 1.000 | 1.000 |
| Recall@10 | 1.000 | 1.000 |
| MRR@10 | 0.9995 | 1.000 |
Same 500K corpus, same evaluation protocol.
| Index | Size | NodeMind compression |
|---|---|---|
| NodeMind BGE-M3 (1024-bit) | 64 MB | — |
| Float32 RAG (BGE-M3) | 2,048 MB | 32× (NodeMind 1024-bit is 32× smaller) |
| HNSW index (float32 + ~1.5× graph overhead) | 3,072 MB | 48× (NodeMind 1024-bit is 48× smaller) |
| NodeMind BGE-base (256-bit PCA) | 16 MB | 96× (vs its 1,536 MB float32 BGE-base baseline) |
Sizes cover the index only; document text is stored separately, and equally, in all systems.
| Source | Volume | Description |
|---|---|---|
| Wikipedia (Simple English) | ~100 MB raw text | General knowledge articles |
| arXiv papers | ~40 MB raw text | Computer science & ML abstracts |
| Project Gutenberg books | ~28 MB raw text | Public domain prose |
| Total raw corpus | ~168 MB | 642,939 paragraphs |
| Chunks | 500,000 | 400 words/chunk, 50-word overlap |
| Embedding model | BGE-M3 | 1024-dim, unit-normalised float32 |
| Hardware | NVIDIA A40 (46GB) | 42.5 min to embed 500K chunks |
All indexes were generated from the same 500,000 chunks. Download NodeMind + float32 RAG side by side to verify the compression ratios yourself.
| File | Size | What it is |
|---|---|---|
| NodeMind BGE-M3 Index (32×) | 64 MB | Binary fingerprints + index metadata |
| Float32 RAG Index (baseline) | 2,048 MB | Raw float32 embeddings — verify the 32× yourself |
| HNSW Size Reference | <1 KB | HNSW = float32 × 1.5× overhead — explains the 48× number |
| NodeMind BGE-base 256-bit (96×) | 16 MB | PCA-compressed binary — verify the 96× yourself |
| Corpus | ~144 MB | 500K text chunks (shared by all indexes) |
| Benchmark PDF | ~2 MB | Full methodology and results report |
Full interactive benchmark page: nodemind.space/benchmark
```python
import pickle

# Load NodeMind index
with open("nm_bgem3_index.pkl", "rb") as f:
    nm = pickle.load(f)
# nm["fps"] — (500000, 128) uint8 = 64 MB binary fingerprints
# nm["ctv"] — (1024,) float32 = index metadata

# Load float32 RAG index
with open("rag_bgem3_index.pkl", "rb") as f:
    rag = pickle.load(f)
# rag["embeddings"] — (500000, 1024) float32 = 2,048 MB

nm_bytes = nm["fps"].nbytes
rag_bytes = rag["embeddings"].nbytes
print(f"NodeMind : {nm_bytes / 1e6:.0f} MB")
print(f"Float32  : {rag_bytes / 1e6:.0f} MB")
print(f"Ratio    : {rag_bytes // nm_bytes}×")  # → 32

# BGE-base 256-bit (96×)
with open("nm_bgebase256_index.pkl", "rb") as f:
    nm96 = pickle.load(f)
# nm96["fps"] — (500000, 32) uint8 = 16 MB
# float32 baseline for BGE-base = 500000 × 768 × 4 = 1,536 MB → 96×
```

```python
import pickle

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

with open("corpus.pkl", "rb") as f:
    corpus = pickle.load(f)
chunks = corpus["chunks"]  # list of 500,000 strings

with open("nm_bgem3_index.pkl", "rb") as f:
    nm = pickle.load(f)
fps = nm["fps"]  # (500000, 128) uint8

# Byte-wise popcount lookup table: Hamming distance = popcount of XOR
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.int32)

def query_nodemind(text, top_k=5):
    emb = model.encode([text], normalize_embeddings=True)[0]
    # binarisation uses index metadata — details in the patent
    q_fp = _binarise(emb, nm)
    dists = POPCOUNT[np.bitwise_xor(fps, q_fp[np.newaxis, :])].sum(axis=1)
    top = np.argsort(dists)[:top_k]
    return [(int(dists[i]), chunks[i][:120]) for i in top]

results = query_nodemind("What is quantum entanglement?")
for dist, text in results:
    print(f"  [{dist:4d}] {text}")
```

The `_binarise` function uses the index metadata stored in the pkl file. The full binarisation method is covered under AU 2026901656; the index is self-contained and works without reading the patent.
- Text is chunked and embedded with a sentence embedding model (BGE-M3 or BGE-base), producing a high-dimensional float32 vector per chunk.
- Each embedding is converted to a compact binary fingerprint using the index's pre-computed metadata vector.
- The result is 1024 bits (128 bytes) per chunk for BGE-M3, or 256 bits (32 bytes) with BGE-base PCA.
- The binarisation is integer-only: no learned projection, no GPU needed at query time.
- (The exact method is patent-protected: AU 2026901656.)
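For intuition only, here is a generic sign-threshold binarisation sketch. It shows the data layout (1024 float32 dimensions packed into 128 bytes) but is NOT the patented CTV method; the threshold vector here is a placeholder:

```python
import numpy as np

def binarise_sign(emb, centre):
    """Generic sign binarisation: threshold each dimension against a
    reference vector, then pack the resulting bits into uint8 bytes.
    A standard baseline shown only to make the layout concrete:
    1024 dims -> 1024 bits -> 128 bytes."""
    bits = (emb > centre).astype(np.uint8)
    return np.packbits(bits)

emb = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
fp = binarise_sign(emb, np.zeros(1024, dtype=np.float32))
assert fp.shape == (128,) and fp.dtype == np.uint8
```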
The binary fingerprints are stored in a Multi-Index Hash structure. At query time, candidates are found in matching hash buckets and re-ranked by full Hamming distance. Pure integer arithmetic, runs on any CPU.
(The MIH structure follows Norouzi et al. CVPR 2012. The novel contribution — CTV binarisation and portable single-file format — is covered under AU 2026901657.)
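A minimal multi-index hashing sketch in the spirit of Norouzi et al.: split each 128-byte code into disjoint substrings, bucket row ids by substring value, probe exact-match buckets at query time, and re-rank candidates by full Hamming distance. This is an illustrative reimplementation, not NodeMind's actual index format:

```python
import numpy as np

# Byte-wise popcount lookup table for Hamming distance
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.int32)

def build_mih(fps, n_tables=8):
    """Bucket row ids by each of n_tables disjoint substrings
    of the (n, 128) uint8 fingerprint matrix."""
    n, nbytes = fps.shape
    sub = nbytes // n_tables
    tables = []
    for t in range(n_tables):
        buckets = {}
        for i, chunk in enumerate(fps[:, t * sub:(t + 1) * sub]):
            buckets.setdefault(chunk.tobytes(), []).append(i)
        tables.append(buckets)
    return tables

def mih_query(q_fp, fps, tables, top_k=5):
    """Probe each table's bucket that exactly matches the query substring,
    then re-rank candidates by full Hamming distance. By pigeonhole, any
    code within Hamming distance n_tables - 1 of the query matches at
    least one substring exactly, so near neighbours are never missed."""
    n_tables = len(tables)
    sub = fps.shape[1] // n_tables
    cand = set()
    for t, buckets in enumerate(tables):
        key = q_fp[t * sub:(t + 1) * sub].tobytes()
        cand.update(buckets.get(key, []))
    if not cand:  # no bucket hit: fall back to brute force
        cand = range(fps.shape[0])
    cand = np.fromiter(cand, dtype=np.int64)
    dists = POPCOUNT[np.bitwise_xor(fps[cand], q_fp)].sum(axis=1)
    order = np.argsort(dists)[:top_k]
    return cand[order], dists[order]
```

Real MIH also probes buckets within a Hamming radius of each substring; exact-match probes keep the sketch short.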
- BGE-M3 float32 → binary: 32× vs float32, 48× vs HNSW
- BGE-base + PCA to 256-bit → binary: 96× vs float32
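As a sanity check, the headline ratios follow directly from the stated dimensions and chunk count:

```python
# Index-size arithmetic behind the headline ratios (500K chunks)
n = 500_000
nm_m3 = n * 1024 // 8          # 1024-bit fingerprints ->    64,000,000 B
f32_m3 = n * 1024 * 4          # float32 BGE-M3        -> 2,048,000,000 B
hnsw_m3 = int(f32_m3 * 1.5)    # ~1.5x graph overhead  -> 3,072,000,000 B
nm_256 = n * 256 // 8          # 256-bit PCA           ->    16,000,000 B
f32_base = n * 768 * 4         # float32 BGE-base      -> 1,536,000,000 B
print(f32_m3 // nm_m3, hnsw_m3 // nm_m3, f32_base // nm_256)  # → 32 48 96
```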
The index is a single portable .pkl file. No server, no Docker, no external DB.
- Self-retrieval benchmark. Queries are perturbed versions of corpus chunks — optimistic for binary methods. End-to-end QA accuracy on BEIR / MS MARCO has not yet been measured; results may differ on out-of-distribution queries.
- HNSW comparison is index-size only. Real FAISS HNSW achieves recall@10 of 0.95–0.99 on most corpora. NodeMind achieves recall@10 of 1.000 on this benchmark, but this is a self-retrieval test — not a direct head-to-head on a neutral held-out set.
- 96× requires BGE-base + PCA-256. If you need BGE-M3 (stronger cross-lingual model), you get 32×/48×. The 96× path uses a lighter model.
- Corpus is text-only. Tables, code, and multi-modal documents were not tested.
- Float32 RAG download is 2 GB. Budget the bandwidth if you want to verify baseline sizes.
- AU 2026901656 — WHT Integer Codec (integer-only binarisation without learned projection)
- AU 2026901657 — NodeMind Centroid MIH (CTV-based binary fingerprinting + MIH search)
Filed at IP Australia, May 2026. Built solo in Coleambally, regional NSW, Australia.
- Live demo: nodemind.space
- Benchmark page: nodemind.space/benchmark
- X/Twitter: @Qlnix4E49